Search | VHL Regional Portal

Comparison of Natural Language Processing and Manual Coding for the Identification of Cross-Sectional Imaging Reports Suspicious for Lung Cancer.

Wadia, Roxanne; Akgun, Kathleen; Brandt, Cynthia; Fenton, Brenda T; Levin, Woody; Marple, Andrew H; Garla, Vijay; Rose, Michal G; Taddei, Tamar; Taylor, Caroline.

JCO Clin Cancer Inform ; 2: 1-7, 2018 Dec.

Article in English | MEDLINE | ID: mdl-30652545

ABSTRACT

PURPOSE: To compare the accuracy and reliability of a natural language processing (NLP) algorithm with manual coding by radiologists, and the combination of the two methods, for the identification of patients whose computed tomography (CT) reports raised the concern for lung cancer. METHODS: An NLP algorithm was developed using Clinical Text Analysis and Knowledge Extraction System (cTAKES) with the Yale cTAKES Extensions and trained to differentiate between language indicating benign lesions and lesions concerning for lung cancer. A random sample of 450 chest CT reports performed at Veterans Affairs Connecticut Healthcare System between January 2014 and July 2015 was selected. A reference standard was created by the manual review of reports to determine if the text stated that follow-up was needed for concern for cancer. The NLP algorithm was applied to all reports and compared with case identification using the manual coding by the radiologists. RESULTS: A total of 450 reports representing 428 patients were analyzed. NLP had higher sensitivity and lower specificity than manual coding (77.3% v 51.5% and 72.5% v 82.5%, respectively). NLP and manual coding had similar positive predictive values (88.4% v 88.9%), and NLP had a higher negative predictive value than manual coding (54% v 38.5%). When NLP and manual coding were combined, sensitivity increased to 92.3%, with a decrease in specificity to 62.85%. Combined NLP and manual coding had a positive predictive value of 87.0% and a negative predictive value of 75.2%. CONCLUSION: Our NLP algorithm was more sensitive than manual coding of CT chest reports for the identification of patients who required follow-up for suspicion of lung cancer. The combination of NLP and manual coding is a sensitive way to identify patients who need further workup for lung cancer.
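
The four accuracy measures reported above all derive from a 2x2 confusion matrix against the reference standard. The sketch below computes them on a made-up set of eight reports and combines the two methods with a logical OR (a report is flagged if either method flags it). The OR rule is an assumption: the abstract does not say how NLP and manual coding were combined, although an OR rule is consistent with the reported rise in sensitivity and fall in specificity. The function names and the toy labels are hypothetical and are not study data.

```python
from typing import NamedTuple, Sequence


class Accuracy(NamedTuple):
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def accuracy(truth: Sequence[bool], flagged: Sequence[bool]) -> Accuracy:
    """Compute the four measures from paired reference/predicted labels."""
    pairs = list(zip(truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)
    tn = sum(1 for t, f in pairs if not t and not f)
    fp = sum(1 for t, f in pairs if not t and f)
    fn = sum(1 for t, f in pairs if t and not f)
    return Accuracy(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


def combine_or(a: Sequence[bool], b: Sequence[bool]) -> list:
    """Assumed combination rule: flag a report if either method flags it."""
    return [x or y for x, y in zip(a, b)]


# Toy labels (hypothetical): True means "concerning for lung cancer".
truth = [True, True, True, True, False, False, False, False]
nlp = [True, True, True, False, True, False, False, False]
manual = [True, False, False, True, False, False, False, True]

nlp_acc = accuracy(truth, nlp)
manual_acc = accuracy(truth, manual)
combined_acc = accuracy(truth, combine_or(nlp, manual))
```

Under an OR rule the combined method can never miss a case that either method catches, so sensitivity cannot fall, while every false positive from either method is kept, so specificity cannot rise.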

Subjects

Clinical Coding/methods; Lung Neoplasms/diagnostic imaging; Natural Language Processing; Aged; Algorithms; Connecticut; Female; Humans; Male; Middle Aged; Sensitivity and Specificity; Tomography, X-Ray Computed

Mathematical modelling of arsenic transport, distribution and detoxification processes in yeast.

Talemi, Soheil Rastgou; Jacobson, Therese; Garla, Vijay; Navarrete, Clara; Wagner, Annemarie; Tamás, Markus J; Schaber, Jörg.

Mol Microbiol ; 92(6): 1343-56, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24798644

ABSTRACT

Arsenic has a dual role as causative and curative agent of human disease. Therefore, there is considerable interest in elucidating arsenic toxicity and detoxification mechanisms. By an ensemble modelling approach, we identified a best parsimonious mathematical model which recapitulates and predicts intracellular arsenic dynamics for different conditions and mutants, thereby providing novel insights into arsenic toxicity and detoxification mechanisms in yeast, which could partly be confirmed experimentally by dedicated experiments. Specifically, our analyses suggest that: (i) arsenic is mainly protein-bound during short-term (acute) exposure, whereas glutathione-conjugated arsenic dominates during long-term (chronic) exposure, (ii) arsenic is not stably retained, but can leave the vacuole via an export mechanism, and (iii) Fps1 is controlled by Hog1-dependent and Hog1-independent mechanisms during arsenite stress. Our results challenge glutathione depletion as a key mechanism for arsenic toxicity and instead suggest that (iv) increased glutathione biosynthesis protects the proteome against the damaging effects of arsenic and that (v) widespread protein inactivation contributes to the toxicity of this metalloid. Our work in yeast may prove useful to elucidate similar mechanisms in higher eukaryotes and have implications for the use of arsenic in medical therapy.

Subjects

Arsenic/metabolism; Models, Theoretical; Saccharomyces cerevisiae/metabolism; Biotransformation; Inactivation, Metabolic

The effect of a lung cancer care coordination program on timeliness of care.

Alsamarai, Susan; Yao, Xiaopan; Cain, Hilary C; Chang, Bryan W; Chao, Herta H; Connery, Donna M; Deng, Yanhong; Garla, Vijay N; Hunnibell, Laura S; Kim, Anthony W; Obando, J Antonio; Taylor, Caroline; Tellides, George; Rose, Michal G.

Clin Lung Cancer ; 14(5): 527-34, 2013 Sep.

Article in English | MEDLINE | ID: mdl-23827516

ABSTRACT

BACKGROUND: Timeliness of care improves patient satisfaction and might improve outcomes. The Cancer Care Coordination Program (CCCP) was established in November 2007 to improve timeliness of care of non-small-cell lung cancer (NSCLC) at the Veterans Affairs Connecticut Healthcare System (VACHS). PATIENTS AND METHODS: We performed a retrospective cohort analysis of patients diagnosed with NSCLC at VACHS between 2005 and 2010. We compared timeliness of care and stage at diagnosis before and after the implementation of the CCCP. RESULTS: Data from 352 patients were analyzed: 163 with initial abnormal imaging between January 1, 2005 and October 31, 2007, and 189 with imaging conducted between November 1, 2007 and December 31, 2010. Variables associated with a longer interval between the initial abnormal image and the initiation of therapy were: (1) earlier stage (mean of 130 days for stages I/II vs. 87 days for stages III/IV; P < .0001); (2) lack of cancer-related symptoms (145 vs. 60 days; P < .0001); (3) presence of more than 1 medical comorbidity (123 vs. 82 days; P = .0002); and (4) depression (126 vs. 98 days; P = .029). The percent of patients diagnosed at stages I/II increased from 32% to 48% (P = .006) after establishment of the CCCP. In a multivariate model adjusting for stage, histology, reason for imaging, and presence of primary care provider, implementation of the CCCP resulted in a mean reduction of 25 days between first abnormal image and the initiation of treatment (126 to 101 days; P = .015). CONCLUSION: A centralized, multidisciplinary, hospital-based CCCP can improve timeliness of NSCLC care, and help ensure that early stage lung cancers are diagnosed and treated.

Subjects

Adenocarcinoma/therapy; Carcinoma, Non-Small-Cell Lung/therapy; Carcinoma, Squamous Cell/therapy; Cooperative Behavior; Lung Neoplasms/therapy; Quality Assurance, Health Care/methods; Adenocarcinoma/diagnosis; Adenocarcinoma/mortality; Aged; Carcinoma, Non-Small-Cell Lung/diagnosis; Carcinoma, Non-Small-Cell Lung/mortality; Carcinoma, Squamous Cell/diagnosis; Carcinoma, Squamous Cell/mortality; Disease Management; Female; Follow-Up Studies; Humans; Lung Neoplasms/diagnosis; Lung Neoplasms/mortality; Male; Neoplasm Staging; Prognosis; Retrospective Studies; Survival Rate; Time Factors; Veterans

Semi-supervised clinical text classification with Laplacian SVMs: an application to cancer case management.

Garla, Vijay; Taylor, Caroline; Brandt, Cynthia.

J Biomed Inform ; 46(5): 869-75, 2013 Oct.

Article in English | MEDLINE | ID: mdl-23845911

ABSTRACT

OBJECTIVE: To compare linear and Laplacian SVMs on a clinical text classification task; to evaluate the effect of unlabeled training data on Laplacian SVM performance. BACKGROUND: The development of machine-learning based clinical text classifiers requires the creation of labeled training data, obtained via manual review by clinicians. Due to the effort and expense involved in labeling data, training data sets in the clinical domain are of limited size. In contrast, electronic medical record (EMR) systems contain hundreds of thousands of unlabeled notes that are not used by supervised machine learning approaches. Semi-supervised learning algorithms use both labeled and unlabeled data to train classifiers, and can outperform their supervised counterparts. METHODS: We trained support vector machines (SVMs) and Laplacian SVMs on a training reference standard of 820 abdominal CT, MRI, and ultrasound reports labeled for the presence of potentially malignant liver lesions that require follow-up (positive class prevalence 77%). The Laplacian SVM used 19,845 randomly sampled unlabeled notes in addition to the training reference standard. We evaluated SVMs and Laplacian SVMs on a test set of 520 labeled reports. RESULTS: The Laplacian SVM trained on labeled and unlabeled radiology reports significantly outperformed supervised SVMs (macro-F1 0.773 vs. 0.741, sensitivity 0.943 vs. 0.911, positive predictive value 0.877 vs. 0.883). Performance improved with the number of labeled and unlabeled notes used to train the Laplacian SVM (Pearson's ρ=0.529 for correlation between number of unlabeled notes and macro-F1 score). These results suggest that practical semi-supervised methods such as the Laplacian SVM can leverage the large, unlabeled corpora that reside within EMRs to improve clinical text classification.
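
The headline metric above, macro-F1, is the unweighted mean of the per-class F1 scores, so the minority (negative) class counts as much as the 77%-prevalent positive class. The sketch below shows that computation in plain Python on made-up labels; the function names and data are hypothetical, and it illustrates only the metric, not the Laplacian SVM itself.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 label: Hashable) -> float:
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: define F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Toy labels (hypothetical): 1 = lesion requiring follow-up, 0 = no follow-up.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
score = macro_f1(y_true, y_pred)
```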

Subjects

Case Management; Neoplasms/therapy; Support Vector Machine; Algorithms; Humans

Knowledge-based biomedical word sense disambiguation: an evaluation and application to clinical document classification.

Garla, Vijay N; Brandt, Cynthia.

J Am Med Inform Assoc ; 20(5): 882-6, 2013.

Article in English | MEDLINE | ID: mdl-23077130

ABSTRACT

BACKGROUND: Word sense disambiguation (WSD) methods automatically assign an unambiguous concept to an ambiguous term based on context, and are important to many text-processing tasks. In this study we developed and evaluated a knowledge-based WSD method that uses semantic similarity measures derived from the Unified Medical Language System (UMLS) and evaluated the contribution of WSD to clinical text classification. METHODS: We evaluated our system on biomedical WSD datasets and determined the contribution of our WSD system to clinical document classification on the 2007 Computational Medicine Challenge corpus. RESULTS: Our system compared favorably with other knowledge-based methods. Machine learning classifiers trained on disambiguated concepts significantly outperformed those trained using all concepts. CONCLUSIONS: We developed a WSD system that achieves high disambiguation accuracy on standard biomedical WSD datasets and showed that our WSD system improves clinical document classification. DATA SHARING: We integrated our WSD system with MetaMap and the clinical Text Analysis and Knowledge Extraction System, two popular biomedical natural language processing systems. All code required to reproduce our results and all tools developed as part of this study are released as open source, available under http://code.google.com/p/ytex.

Subjects

Data Mining/methods; Knowledge Bases; Natural Language Processing; Unified Medical Language System; Artificial Intelligence; Literature; Medical Subject Headings; Semantics

Semantic similarity in the biomedical domain: an evaluation across knowledge sources.

Garla, Vijay N; Brandt, Cynthia.

BMC Bioinformatics ; 13: 261, 2012 Oct 10.

Article in English | MEDLINE | ID: mdl-23046094

ABSTRACT

BACKGROUND: Semantic similarity measures estimate the similarity between concepts, and play an important role in many text processing tasks. Approaches to semantic similarity in the biomedical domain can be roughly divided into knowledge based and distributional based methods. Knowledge based approaches utilize knowledge sources such as dictionaries, taxonomies, and semantic networks, and include path finding measures and intrinsic information content (IC) measures. Distributional measures utilize, in addition to a knowledge source, the distribution of concepts within a corpus to compute similarity; these include corpus IC and context vector methods. Prior evaluations of these measures in the biomedical domain showed that distributional measures outperform knowledge based path finding methods; but more recent studies suggested that intrinsic IC based measures exceed the accuracy of distributional approaches. Limitations of previous evaluations of similarity measures in the biomedical domain include their focus on the SNOMED CT ontology, and their reliance on small benchmarks not powered to detect significant differences between measure accuracy. There have been few evaluations of the relative performance of these measures on other biomedical knowledge sources such as the UMLS, and on larger, recently developed semantic similarity benchmarks. RESULTS: We evaluated knowledge based and corpus IC based semantic similarity measures derived from SNOMED CT, MeSH, and the UMLS on recently developed semantic similarity benchmarks. Semantic similarity measures based on the UMLS, which contains SNOMED CT and MeSH, significantly outperformed those based solely on SNOMED CT or MeSH across evaluations. Intrinsic IC based measures significantly outperformed path-based and distributional measures. We released all code required to reproduce our results and all tools developed as part of this study as open source, available under http://code.google.com/p/ytex. We provide a publicly-accessible web service to compute semantic similarity, available under http://informatics.med.yale.edu/ytex.web/. CONCLUSIONS: Knowledge based semantic similarity measures are more practical to compute than distributional measures, as they do not require an external corpus. Furthermore, knowledge based measures significantly and meaningfully outperformed distributional measures on large semantic similarity benchmarks, suggesting that they are a practical alternative to distributional measures. Future evaluations of semantic similarity measures should utilize benchmarks powered to detect significant differences in measure accuracy.
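
To make the two knowledge-based families named above concrete, the sketch below computes a path-finding measure and an intrinsic-IC measure on a tiny, invented is-a taxonomy. Several choices here are assumptions, since the abstract does not name specific formulas: the path measure is taken as 1 / (number of edges + 1), the intrinsic IC follows the Seco et al. formulation (1 - log(descendants + 1) / log(total concepts)), and the IC-based similarity is Lin's measure. The taxonomy and all function names are hypothetical and are not taken from SNOMED CT, MeSH, or the UMLS.

```python
import math

# Hypothetical single-parent taxonomy: concept -> parent (None marks the root).
PARENT = {
    "entity": None,
    "disease": "entity",
    "neoplasm": "disease",
    "infection": "disease",
    "lung_neoplasm": "neoplasm",
    "liver_neoplasm": "neoplasm",
}


def ancestors(concept: str) -> list:
    """Path from the concept up to the root, starting with the concept itself."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def least_common_subsumer(a: str, b: str) -> str:
    """Deepest concept that subsumes both a and b."""
    b_ancestors = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_ancestors)


def path_similarity(a: str, b: str) -> float:
    """Path-finding measure: inverse of (edges on the shortest path + 1)."""
    lcs = least_common_subsumer(a, b)
    edges = ancestors(a).index(lcs) + ancestors(b).index(lcs)
    return 1.0 / (edges + 1)


def intrinsic_ic(concept: str) -> float:
    """Intrinsic IC from taxonomy structure alone; no corpus is needed."""
    descendants = sum(
        1 for c in PARENT if c != concept and concept in ancestors(c)
    )
    return 1.0 - math.log(descendants + 1) / math.log(len(PARENT))


def lin_similarity(a: str, b: str) -> float:
    """Lin's measure using intrinsic IC in place of corpus-derived IC."""
    denominator = intrinsic_ic(a) + intrinsic_ic(b)
    if denominator == 0:
        return 1.0  # only reachable when both concepts are the root
    return 2 * intrinsic_ic(least_common_subsumer(a, b)) / denominator
```

Both measures rank `lung_neoplasm` closer to `liver_neoplasm` than to `infection`, and neither requires concept frequencies from an external corpus, which is the practical advantage the conclusions point to.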

Subjects

Knowledge Bases; Medical Subject Headings; Semantics; Systematized Nomenclature of Medicine; Unified Medical Language System; Natural Language Processing

Ontology-guided feature engineering for clinical text classification.

Garla, Vijay N; Brandt, Cynthia.

J Biomed Inform ; 45(5): 992-8, 2012 Oct.

Article in English | MEDLINE | ID: mdl-22580178

ABSTRACT

In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning based clinical text classification. Critical steps in clinical text classification include identification of features and passages relevant to the classification task, and representation of clinical text to enable discrimination between documents of different classes. We developed novel information-theoretic techniques that utilize the taxonomical structure of the Unified Medical Language System (UMLS) to improve feature ranking, and we developed a semantic similarity measure that projects clinical text into a feature space that improves classification. We evaluated these methods on the 2008 Integrating Informatics with Biology and the Bedside (I2B2) obesity challenge. The methods we developed improve upon the results of this challenge's top machine-learning based system, and may improve the performance of other machine-learning based clinical text classification systems. We have released all tools developed as part of this study as open source, available at http://code.google.com/p/ytex.

Subjects

Algorithms; Natural Language Processing; Cardiovascular Diseases; Data Mining; Databases as Topic/classification; Humans; Medical Informatics Applications; Models, Theoretical; Obesity; Semantics; Unified Medical Language System

The Yale cTAKES extensions for document classification: architecture and application.

Garla, Vijay; Lo Re, Vincent; Dorey-Stein, Zachariah; Kidwai, Farah; Scotch, Matthew; Womack, Julie; Justice, Amy; Brandt, Cynthia.

J Am Med Inform Assoc ; 18(5): 614-20, 2011.

Article in English | MEDLINE | ID: mdl-21622934

ABSTRACT

BACKGROUND: Open-source clinical natural-language-processing (NLP) systems have lowered the barrier to the development of effective clinical document classification systems. Clinical natural-language-processing systems annotate the syntax and semantics of clinical text; however, feature extraction and representation for document classification pose technical challenges. METHODS: The authors developed extensions to the clinical Text Analysis and Knowledge Extraction System (cTAKES) that simplify feature extraction, experimentation with various feature representations, and the development of both rule and machine-learning based document classifiers. The authors describe and evaluate their system, the Yale cTAKES Extensions (YTEX), on the classification of radiology reports that contain findings suggestive of hepatic decompensation. RESULTS AND DISCUSSION: The F1-score of the system for the retrieval of abdominal radiology reports was 96%, and was 79%, 91%, and 95% for the presence of liver masses, ascites, and varices, respectively. The authors released YTEX as open source, available at http://code.google.com/p/ytex.

Subjects

Data Mining; Decision Support Systems, Clinical; Electronic Health Records; Natural Language Processing; Pattern Recognition, Automated; Connecticut; Data Mining/classification; Decision Support Systems, Clinical/classification; Electronic Health Records/classification; Humans; Liver Failure/diagnostic imaging; Pattern Recognition, Automated/classification; Radiography; Radiology Information Systems/classification

MU2A--reconciling the genome and transcriptome to determine the effects of base substitutions.

Garla, Vijay; Kong, Yong; Szpakowski, Sebastian; Krauthammer, Michael.

Bioinformatics ; 27(3): 416-8, 2011 Feb 01.

Article in English | MEDLINE | ID: mdl-21149339

ABSTRACT

MOTIVATION: Next-generation sequencing technologies enable the identification of sequence variation in the genome and transcriptome. Differences between the reference genome and transcript libraries complicate the determination of the effect of genomic sequence variants on protein products; similarly, these differences complicate the mapping of sequence variants found in transcripts to their respective genomic position. We have developed MU2A, a publicly available web service for variant annotation that reconciles differences between the genome and transcriptome, enabling the rapid and accurate determination of the effects of genomic variants on protein products, and the mapping of variants detected in transcripts to genomic coordinates. The MU2A web service is available at http://krauthammerlab.med.yale.edu/mu2a. We have released MU2A as open source, available at http://code.google.com/p/mu2a/.
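
As a rough illustration of what "the effect of a base substitution on a protein product" means, the sketch below classifies a single-base change in a coding sequence as synonymous, missense, nonsense, or stop-lost using the standard genetic code. This is not MU2A's method or API: it assumes the variant position is already expressed in transcript (coding-sequence) coordinates, which is precisely the genome-to-transcript reconciliation that MU2A addresses and that this sketch skips. The function name and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitution_effect(cds: str, position: int, alt_base: str) -> str:
    """Classify a single-base substitution at a 0-based CDS position."""
    offset = position % 3
    start = position - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # new premature stop codon
    if ref_aa == "*":
        return "stop-lost"  # original stop codon now codes for an amino acid
    return "missense"


# Hypothetical coding sequence: Met-Glu-stop.
cds = "ATGGAATAA"
effects = [
    substitution_effect(cds, 5, "G"),  # GAA -> GAG
    substitution_effect(cds, 4, "T"),  # GAA -> GTA
    substitution_effect(cds, 3, "T"),  # GAA -> TAA
]
```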

Subjects

Genetic Variation; Genome; Molecular Sequence Annotation/methods; Transcriptome; Humans; Internet; Mutation; Neoplasms/genetics; Software
